b'Office of the Inspector General\nOffice of Audit\nEvaluation of the Social Security Administration`s\nBack-up and Recovery Testing of Its Automated Systems - A-13-97-12014\n- 9/24/97\nTABLE OF CONTENTS\nEXECUTIVE SUMMARY\nINTRODUCTION\nRESULTS OF REVIEW\nFURTHER IMPROVEMENTS ARE NEEDED TO STRENGTHEN SSA`S OVERALL RECOVERY TESTING PROCESS\nThe Application Test Objectives Were Not Completed\nOnly 6 of the 12 Critical Workload Areas Have Been Tested to Date at COMDISCO\nNo Documented Performance Standards Exist to Measure Stress Test Results\nDifficulties in Establishing the Support Environment and Incompatibilities between Different Facility Complexes Prevented the Successful Completion of the MTAS Workload\nTapes Sent to COMDISCO from the OSSF May Not Include All Critical Workloads\nCONCLUSION AND RECOMMENDATIONS\nAPPENDICES\nAppendix A - Critical Workloads\nAppendix C - Major Contributors to this Report\nEXECUTIVE SUMMARY\nOBJECTIVE\nThe objective of this audit was to observe and evaluate the testing\nof the Social Security Administration`s (SSA) back-up and recovery\nplan (BRP) conducted at COMDISCO from February 28 through March 2,\n1997.\nBACKGROUND\nSSA is required by the Office of Management and Budget (OMB) Circular\nA-130 to have in place a disaster recovery plan for its automated\nsystems. The recovery plan should be fully documented and periodically\ntested. This report is the second report that focuses on SSA`s\ndisaster recovery planning. In the first report, we reviewed the\nBRP document and other related areas and concluded that SSA had made\nsignificant improvements since our prior review in 1984. 
We also\nfound that SSA generally was in compliance with OMB Circular A-130.\nIn this current report, we address the periodic testing of the BRP.\nSSA operationally tests its recovery plan every 12 to 18 months.\nThe March 1997 test was the fourth opportunity SSA has had to test\nits on-line environment at COMDISCO in North Bergen, New Jersey.\nCOMDISCO is a commercial recovery facility vendor that SSA has contracted\nwith to provide its disaster recovery support.\nThe primary objectives for this test were to re-establish data processing\nand network environments, and test the functionality of a limited\nnumber of on-line and batch environment applications. One field office\n(FO) from each of six SSA regions participated in this test. The\napplications tested were to:\nProcess initial title II claims through the Modernized Claims\nSystem (MCS) and then run the batch jobs at night.\nProcess some critical payments through the Critical Payment System\n(CPS) and run the batch jobs at night.\nProcess payroll through the Management Time and Attendance System\n(MTAS).\nPerform some on-line query responses.\nIn addition to the applications being tested, SSA performed testing\nof the network to determine at what point the network began experiencing\nsignificant processing delays.\nRESULTS OF REVIEW\nThe March 1997 recovery testing at COMDISCO did not meet all of\nits objectives. We believe the testing could have been more successful\nif more time had been made available to solve the start-up problems.\nFurther improvements are needed to strengthen SSA`s overall\nrecovery testing process. 
Specifically, we found that:\nTHE APPLICATION TEST OBJECTIVES WERE NOT COMPLETED;\nONLY 6 OF THE 12 CRITICAL WORKLOAD AREAS HAVE BEEN TESTED TO\nDATE AT COMDISCO;\nNO DOCUMENTED PERFORMANCE STANDARDS EXIST TO MEASURE STRESS TEST\nRESULTS;\nDIFFICULTIES IN ESTABLISHING THE SUPPORT ENVIRONMENT AND\nINCOMPATIBILITIES BETWEEN THE MANAGEMENT INFORMATION SERVICE\nFACILITY (MISF) AND THE PROGRAMMATIC PROCESSING FACILITY (PPF)\nCOMPLEXES PREVENTED THE SUCCESSFUL COMPLETION OF THE MTAS WORKLOAD;\nAND\nTAPES SENT TO COMDISCO FROM THE OFF-SITE STORAGE FACILITY (OSSF)\nMAY NOT INCLUDE CRITICAL FILES.\nRECOMMENDATIONS\nTo improve the recovery testing process, we recommend that SSA:\nConsider increasing the test time at COMDISCO from 3 to 4 days\nto allow more time for technicians to solve start-up problems.\nDevelop a master application test plan so that all critical workloads\nare tested on a cyclical basis. Plan to test the critical workload\nareas that have not yet been tested. Increase the number of applications\nbeing tested for the next test date.\nDevelop performance standards\nand certify stress test results.\nCorrect the incompatibility problem to assure that all non-PPF\nworkloads will run at COMDISCO.\nContinue to automate the back-up tape pick list process to select\ntapes that are critical to the test.\nSSA COMMENTS\nSSA agreed with our report recommendations to strengthen the back-up\nand recovery testing at COMDISCO. Appendix B of this report includes\na copy of SSA`s comments to our report.\nINTRODUCTION\nOBJECTIVE\nThe objective of this audit was to observe and evaluate the testing\nof SSA`s BRP conducted at COMDISCO from February 28 through March\n2, 1997.\nBACKGROUND\nOn June 29, 1993, SSA contracted with COMDISCO in North Bergen,\nNew Jersey to provide SSA`s recovery support. 
The contract was\namended in November 1995 to include a COMDISCO satellite location\nin Columbia, Maryland which would be used by SSA for test monitoring.\nThe March 1997 test was the fourth on-line testing opportunity SSA\nhas had at COMDISCO. SSA ran its first test on December 12-14, 1993.\nAs with any initial exercise, SSA had a few problems when starting\nup. SSA gained more experience and expanded the test objective for\nthe August 12-14, 1994 test to include submitting transactions on-line\nfrom various FOs directly to COMDISCO. This test was successful according\nto SSA and demonstrated its ability to re-establish the full functionality\nof the Agency`s mission to resume critical operations at an alternate\nsite. A test had been scheduled for June 2-4, 1995 but, because SSA\nrequested more direct access storage device (DASD) memory than the\ncontract called for, the test had to be postponed until January 1996\nwhen more DASD became available at COMDISCO. The January 26-28, 1996\ntest was expanded to include not only the COMDISCO recovery facility\nin North Bergen, New Jersey, but also the satellite site in Columbia,\nMaryland for test monitoring by SSA.\nFor the March 1997 test, there were 34 recovery team members at\nNorth Bergen and 21 members at Columbia. The primary objectives for\nthis test were to re-establish the data processing and network environments,\nand test the functionality of a limited number of on-line and batch\nenvironment applications. One FO from each of the six regions participated\nin this test. 
The applications tested were to:\nProcess initial title II claims through the MCS and then run\nthe batch jobs at night.\nProcess some critical payments through CPS and run the batch\njobs at night.\nProcess payroll through MTAS.\nPerform some on-line query responses.\nIn addition to the applications being tested, SSA`s Division\nof Integration and Environmental Testing (DIET) performed stress\ntesting of the network to determine at which point the network began\nexperiencing significant processing delays.\nSCOPE AND METHODOLOGY\nWe used several methods to gather evidence for our audit. We reviewed:\n1. relevant documents, e.g., previous studies by the Office of\nthe Inspector General and others;\n2. SSA`s January 31, 1996 BRP document; and\n3. SSA`s recovery test results documents\nfor tests conducted at COMDISCO during December 1993, August 1994,\nand January 1996.\nWe observed the February 28 through March 2, 1997 recovery testing\nat the COMDISCO facility in North Bergen, New Jersey and interviewed\nSSA personnel who were at North Bergen, New Jersey and at the satellite\nlocation in Columbia, Maryland. Field work was performed at SSA Headquarters\nin Baltimore, Maryland and at COMDISCO in North Bergen, New Jersey\nand Columbia, Maryland between March 1997 and April 1997. Our audit\nwas performed in accordance with generally accepted government auditing\nstandards.\nRESULTS OF REVIEW\nFURTHER IMPROVEMENTS ARE NEEDED TO STRENGTHEN SSA`S OVERALL RECOVERY TESTING\nPROCESS\nThe March 1997 recovery testing at COMDISCO did not meet all of\nits testing objectives. The disaster recovery team (DRT) was able\nto re-establish the data processing and network environments; however,\nthey were unable to complete the on-line and batch application testing\nwith the FOs. We believe that if the DRT had more time they could\nhave completed more objectives. 
Improvements are needed to strengthen\nSSA`s overall recovery testing process.\nThe Application Test Objectives Were Not Completed\nThis test involved several new circumstances, described below, that\nresulted in an unstable operating environment when the applications\nwere being tested on Saturday, March 1. The unstable operating environment\nwas the result of the DRT not having enough time to resolve operating\nand application start-up problems, which were caused by the following\nfactors:\nmissing data files;\nnew release versions of several support software products being\nintroduced at the same time;\ninexperience of new personnel; and\nnew hardware.\nIf the DRT had had more time up front to solve the start-up problems,\nwe believe most of the test applications could have been successfully\ncompleted on March 1. SSA`s dynamic data processing and application\nenvironments are becoming more complex each year. Given these complexities\nand interdependencies, we believe that regardless of the extent of\nplanning by the DRT there will always be the risk of unanticipated\nstart-up problems.\nThe window of opportunity for testing on-line applications is on\nSaturdays, when FOs are closed and the network can be switched over\nto COMDISCO. Late Saturday and Sunday are used to execute the batch\nsystems, perform on-line maintenance, and purge the system of SSA\ntest data. The DRT needs an additional 24 hours up front (start Thursday\nat 8 a.m. rather than Friday at 8 a.m.) to resolve any operating\nstart-up problems so on-line application testing can begin on time\nearly Saturday morning.\nOnly 6 of the 12 Critical Workload Areas Have Been Tested to Date\nat COMDISCO\nAfter four testing opportunities at COMDISCO (December 1993, August\n1994, January 1996, March 1997), only 6 of the 12 critical workload\nareas have been tested. Of the six areas that have been tested, only\nthe on-line queries, processing title XVI claims, and MTEXT workloads\nhave been totally successful. 
There was also limited success in processing\npost entitlement events (for example, some applications have run\nsuccessfully while others have not). See Appendix A for a list of\nthe 12 workload areas.\nWe believe that only 6 workloads have been tested to date\nbecause of incomplete planning by SSA for testing all the applications\nin the 12 critical workload areas. Our conclusion is based on the\nfollowing points:\nSSA does not have a multi-year (master) application test scheduling\nplan to ensure that all critical workload areas are tested on a\ncyclical basis, i.e., every 3 years. According to SSA, each test\nplan stands on its own merit, which means the results from each\ntest have not been compiled to develop an overall application\ntesting plan schedule.\nIn our discussions with SSA, there were some inconsistencies\namong SSA components as to what the critical workloads were within\nthe 12 workload areas. The inconsistencies in defining the critical\nworkloads indicate that planning needs improvement. For example,\nwe noted inconsistencies in the latest BRP document, dated January\n31, 1996, which identified the critical workloads. We questioned\nwhy the 800-number system to schedule appointments and referrals\nwas listed as a critical workload in Appendix F of the BRP but\nnot listed in the executive summary as a critical workload. One\nSSA component said it was a critical workload, while another said\nit was not. In another example, we inquired why the MTEXT workload,\nwhich had been scheduled for the March 1997 test, was canceled.\nThe reason given was that SSA now believes this workload is\nnot critical. 
Originally, it was believed that some new beneficiaries\nwould not get their checks unless the MTEXT notices were generated.\nBetter planning would have eliminated the MTEXT workload\nfrom the critical workload list.\nFor the March 1997 recovery test, one application test objective\nwas to process title II claims through the MCS. However, not all\ntitle II claims are processed through MCS. While all claims are\ninitiated through MCS, if MCS identifies exceptions (such as missing\nMaster Beneficiary Record data) the claim must then be processed\neither through the Claims Automated Process System (CAPS) or through\nthe Manual Adjustment Debit, Credit and Award Process (MADCAP).\nIn February 1997, MCS processed 70 percent of the claims, CAPS processed\n4 percent, and MADCAP processed 16 percent. Testing for only those\ntitle II claims that could be processed through MCS overlooks about\n30 percent of all title II claims.\nFinally, SSA only has the opportunity to test every 12 to 18 months\nat COMDISCO. Currently, SSA is testing between three and four applications\nper test date. Testing a larger number of applications on each test date\nwould make more efficient use of these limited opportunities.\nNo Documented Performance Standards Exist to Measure Stress Test\nResults\nThe purpose of stress testing is to determine the volume of transactions\nat which the network would experience significant delays. These tests\nare designed to simulate how the system will perform under actual\nconditions with a high volume of transactions being processed at\none time. For this test, DIET officials said the network reached about 350\ntransactions per second before it began experiencing delays.\nIn comparison, we have been told that during the peak time for a\nnormal day, the National Computer Center (NCC) will process over\n900 transactions per second. 
However, the DIET stress test results\ncannot be evaluated since there are no documented performance standards.\nConsequently, the SSA officials that we talked with could not say\nwhether this service performance level at COMDISCO would be acceptable\nin a disaster situation. The results are not meaningful unless they\ncan be measured against a stated service performance standard.\nAlso, for the March 1997 test, the results (350 transactions/second)\nthat were achieved were based only on log-on/log-off and query-only transaction\nprofiles. The profiles used for the test excluded those transactions\nthat would have resulted in an action to update a data base. Since\nthis was not representative of a typical daily production transaction\nmix at NCC, these stress results are even less meaningful. We were\ntold that not all transaction profiles could be used for this test\nbecause of some technical limitations.\nDifficulties in Establishing the Support Environment and Incompatibilities\nbetween Different Facility Complexes Prevented the Successful\nCompletion of the MTAS Workload\nMost of SSA`s critical workload applications run in the PPF\ncomplex environment; however, several applications run outside it.\nExamples of these applications include Falcon, PSC/OCRO batch, and\nMTAS, which run in the MISF complex, and VTAM and NETVIEW, which reside\nin the Network Management Facility complex environment.\nFor the March 1997 test, SSA tested the MTAS application at COMDISCO.\nThis was the third time the time and attendance application did not\nmeet all of the test objectives. 
One reason for the problem is that SSA\nhas attempted to execute an MISF application in the PPF environment.\nAccording to SSA officials, this presents a number of logistical\nand technical problems, such as record blocking lengths, which to\ndate have made the MTAS application incompatible with the PPF environment.\nAlso, because most of the other non-PPF critical workload applications\nhave not been tested to date, SSA has no assurance these applications\nwill work.\nTapes Sent to COMDISCO from the OSSF May Not Include All Critical\nWorkloads\nFiles to be sent to COMDISCO from the OSSF currently are judgmentally\nselected from over 45,000 tapes at the OSSF. This process introduces\nthe risk of human error, since not all critical tapes may be selected,\nwhich would cost valuable time in a disaster recovery situation. This condition occurred\nin the March 1997 test when several MTAS and IDMS files were missing.\nWhile SSA has made some improvements in the development of the back-up\ntape pick list, further automation of the process is still needed.\nThe recovery pick list should be automated since all the critical\nworkloads are known and all the files associated with these workloads\ncan be identified. The improvements that were made make the process\nmore flexible in that the pick list can be generated outside the\nSSA complex. Prior to this improvement, the tapes had to be selected\nby a person located in the NCC complex. The improvements permit the\nOffice of Systems Design and Development and the Office of Telecommunications\nand Systems Operations personnel to select tapes from a remote site\nusing a laptop computer and a modem.\nCONCLUSION AND RECOMMENDATIONS\nThe March 1997 testing at COMDISCO did not meet all of its objectives.\nWe believe the DRT could have been more successful if more time had\nbeen scheduled to resolve start-up problems. 
The DRT was able to\nre-establish the data processing and network environments; however,\nthey were unable to complete the on-line and batch application testing\nwith the FOs. Further improvements are needed to strengthen SSA`s\noverall recovery testing process. Specifically, we recommend that\nSSA:\nConsider increasing the test time at COMDISCO from 3 to 4 days\nto allow more time for technicians to solve start-up problems.\nDevelop a master application test plan so that all critical workloads\nare tested on a cyclical basis. This should include a list and\ndescription of all the workloads that would be done in each of\nthe 12 critical workload areas, when the workload was last tested,\nthe results, and when it is next scheduled for testing.\nPlan to test the critical workload areas that have not yet been\ntested. Also, for the next recovery test, SSA should increase the\nnumber of applications tested.\nDevelop performance standards and certify the DIET stress test\nresults.\nBenchmarking should be done at the NCC to establish an acceptable\nservice performance standard at COMDISCO.\nAnalyze the environmental incompatibility problem, determine\nthe best approach, and implement appropriate corrective action\nto assure that all non-PPF workloads will run at COMDISCO.\nContinue planning to automate the back-up tape pick list process\nto select tapes. The personnel at the OSSF should be able to execute\nan inventory selection program that would automatically generate\nthe back-up tape pick list.\nSSA COMMENTS\nSSA agreed with all recommendations and informed us that corrective\nactions are being taken.\nAPPENDICES\nAPPENDIX A\nThe following critical workloads were identified from page 8 of\nthe Social Security Administration`s January 31, 1996 back-up\nand recovery plan for the National Computer Center.\nCRITICAL WORKLOADS\n1. Claims, where payment is due within 30 days.\n2. 
Earnings records for disability cases so that development can\nproceed on insured applicants.\n3. Critical payments.\n4. Process appeals with allowances.\n5. Stop work reports.\n6. Emergency enumeration requests.\n7. Time and attendance systems for payroll.\n8. Certification/accounting system for payments.\n9. Interactive direct input for postentitlement events which affect\npayment.\n10. On-line queries.\n11. Critical processing center workloads controlled by the Processing\nCenter Action Control System.\n12. Critical incomplete notices processed through MTEXT.\nTo assist in processing critical workloads, the following systems\nfacilities will be available:\nAdministrative-related support facilities such as\nTOP SECRET, NEWS, NETSTAT, and Model District Office informational\nreleases.\nThe Modernized OCRO System.\nFalcon Data Entry Software and Program Service Center\nWorkloads.\nElectronic mail, specifically cc-mail.\nAPPENDIX C\nMAJOR CONTRIBUTORS TO THIS REPORT\nOffice of the Inspector General\nScott Patterson, Director, Evaluations and Technical Services\nBruce Daugherty, Audit Manager\nRandy Townsley, Senior Auditor'